fix: remove deprecated code_mapping, dev, refresh_cache from examples and README by haoyu-haoyu · Pull Request #935 · sunlabuiuc/PyHealth

haoyu-haoyu · 2026-04-03T08:02:58Z

Summary

Remove references to deprecated code_mapping, dev, and refresh_cache parameters from example scripts and README. These parameters belonged to the legacy BaseEHRDataset API and are no longer accepted by the v2.0 MIMIC3Dataset/MIMIC4Dataset (based on BaseDataset).
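For callers that still build dataset arguments in a shared dictionary, the migration described above amounts to dropping the three legacy keys before constructing the dataset. A minimal sketch of that idea follows. `strip_deprecated_kwargs` is a hypothetical helper, not part of PyHealth, and the `root` and `tables` values are placeholders.

```python
# Hypothetical helper (not part of PyHealth): drop the legacy BaseEHRDataset
# keyword arguments named in this PR before calling a v2.0 dataset constructor.
DEPRECATED_KWARGS = ("code_mapping", "dev", "refresh_cache")


def strip_deprecated_kwargs(kwargs):
    """Return (kept, removed): kwargs without the legacy keys, plus the names dropped."""
    kept = {k: v for k, v in kwargs.items() if k not in DEPRECATED_KWARGS}
    removed = sorted(k for k in kwargs if k in DEPRECATED_KWARGS)
    return kept, removed


# Example arguments in the legacy style; the path and table names are placeholders.
legacy_kwargs = {
    "root": "/path/to/mimic3",
    "tables": ["DIAGNOSES_ICD", "PROCEDURES_ICD", "PRESCRIPTIONS"],
    "code_mapping": {"NDC": "CCSCM"},
    "dev": True,
    "refresh_cache": False,
}

kept, removed = strip_deprecated_kwargs(legacy_kwargs)
print("removed:", removed)
print("kept keys:", sorted(kept))
```

The `kept` dictionary could then be unpacked into the dataset constructor, assuming the remaining keys match what the v2.0 class accepts.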

Files updated

| File | Change |
| --- | --- |
| `README.rst` | Remove `code_mapping={"NDC": "CCSCM"}` from quickstart example |
| `examples/mortality_prediction/mortality_mimic3_grasp.py` | Remove `code_mapping`, `dev`, `refresh_cache` |
| `examples/drug_recommendation/drug_recommendation_mimic4_gamenet.py` | Remove `code_mapping`, `dev`, `refresh_cache` |
| `examples/patient_linkage_mimic3_medlink.py` | Remove `code_mapping`, `dev`, `refresh_cache` |
| `leaderboard/utils.py` | Remove `code_mapping`, `dev`, `refresh_cache` from MIMIC3/MIMIC4 loaders |

Out of scope (for follow-up PRs)

- Task file docstrings (`pyhealth/tasks/*.py`): these contain `code_mapping` in `>>>` doctest examples that also need updating
- `pyhealth/datasets/mimicextract.py`: still uses the legacy API (not yet migrated to v2.0)
- `chat-assistant/corpus/`: auto-generated text corpus, will update when source docs are fixed
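For the docstring follow-up, one way to locate the remaining doctest lines is a small text scan. The sketch below is illustrative only: the matching rule (a `>>>` or `...` line that passes one of the three names as a keyword) is an assumption, and the sample docstring is made up rather than copied from `pyhealth/tasks`.

```python
import re

# Parameter names taken from this PR; the matching rule itself is an assumption.
DEPRECATED = ("code_mapping", "dev", "refresh_cache")
DOCTEST_PREFIX = re.compile(r"^\s*(?:>>>|\.\.\.)")
KWARG = re.compile(r"\b(" + "|".join(DEPRECATED) + r")\s*=")


def find_deprecated_doctest_lines(text):
    """Return (line_number, parameter_name) pairs for doctest lines using a legacy kwarg."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if DOCTEST_PREFIX.match(line):
            for name in KWARG.findall(line):
                hits.append((lineno, name))
    return hits


# Made-up docstring fragment for illustration.
sample = '''
    >>> dataset = MIMIC3Dataset(
    ...     root="/path/to/mimic3",
    ...     tables=["DIAGNOSES_ICD"],
    ...     code_mapping={"NDC": "CCSCM"},
    ... )
    code_mapping = None  # not a doctest line, so it is ignored
'''
print(find_deprecated_doctest_lines(sample))
```

In practice the function would be applied to the contents of each task file; reading those files is left out here to keep the sketch self-contained.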

Fixes #535

The v2.0 MIMIC3Dataset/MIMIC4Dataset (based on BaseDataset) no longer accepts code_mapping, dev, or refresh_cache parameters. These were part of the legacy BaseEHRDataset API. Update README.rst, example scripts, and leaderboard utilities to use the current v2.0 API. Note: task file docstrings and pyhealth/datasets/mimicextract.py still reference code_mapping but are left for separate PRs since mimicextract.py has not yet been migrated to v2.0. Fixes sunlabuiuc#535

jhnwu3

lgtm

* Update/core docs (sunlabuiuc#889)
  * add new docs
  * index
  * overview page added
  * clean up and fix old details

* [Conformal EEG] TUEV/TUAB Compatibility (sunlabuiuc#894)
  * Fixed repo to be able to run TUEV/TUAB + updated example scripts
  * Args need to be passed correctly
  * Minor fixes and precomputed STFT logic
  * Fix the test files to reflect codebase changes
  * Args update

* Updated Conformal Test Scripts (sunlabuiuc#895)
  * Fixed repo to be able to run TUEV/TUAB + updated example scripts
  * Args need to be passed correctly
  * Minor fixes and precomputed STFT logic
  * Fix the test files to reflect codebase changes
  * Args update
  * test script fixes

* fix: prevent batch_size=1 crashes, add weights_only to torch.load, fix device/contiguity issues (sunlabuiuc#901)

  1. Fix bare .squeeze() calls that silently remove the batch dimension when batch_size=1, causing wrong results during single-sample inference:
     - concare.py: .squeeze() → .squeeze(dim=-1) and .squeeze(dim=1)
     - agent.py: .squeeze() → .squeeze(dim=-1) or removed (already 1-D after .sum/.mean)
  2. Add weights_only=True to all torch.load() calls for PyTorch 2.6+ compatibility and security (prevents arbitrary code execution via pickle deserialization):
     - trainer.py, biot.py, tfm_tokenizer.py (2 calls), kg_base.py
  3. Add .contiguous() before pack_padded_sequence in RNNLayer to prevent cuDNN errors with non-contiguous input tensors (fixes sunlabuiuc#800)
  4. Fix StageNet device mismatch (tensors were created on CPU instead of the input tensor's device, causing crashes during GPU training):
     - torch.zeros/ones(...) → torch.zeros/ones(..., device=device)
     - time == None → time is None (PEP8)

* fix: improve research reliability - metrics mutation, eval placement, reproducible splits (sunlabuiuc#902)

  Three fixes that directly affect the trustworthiness of research results:

  1. regression.py: kl_divergence computation mutated the input arrays (x, x_rec) in-place via clamping and normalization. When multiple metrics were requested (e.g., ["kl_divergence", "mse", "mae"]), mse/mae were computed on the modified arrays, producing incorrect values. Fixed by operating on copies.
  2. trainer.py: model.eval() was called inside the per-batch loop in inference(), redundantly setting eval mode on every batch. Moved to before the loop, called once as intended.
  3. splitter.py: all split functions used np.random.seed() which mutates the global numpy random state. This causes cross-contamination when multiple splits are called sequentially, making experiments non-reproducible. Replaced all 7 occurrences with np.random.default_rng(seed) which creates an isolated RNG instance. The existing sample_balanced() already used default_rng correctly.

* fix: port GRASP model to PyHealth 2.0 API (fixes sunlabuiuc#891) (sunlabuiuc#903)

  The GRASP model was completely non-functional in PyHealth 2.0 because it still used the legacy 1.x BaseModel constructor and removed helper methods (get_label_tokenizer, add_feature_transform_layer, prepare_labels, padding2d/3d).

  Changes:
  - Rewrite GRASP.__init__ to use the 2.0 pattern (matching ConCare):
    - super().__init__(dataset=dataset) instead of passing feature_keys/label_key/mode
    - EmbeddingModel(dataset, embedding_dim) replaces manual type dispatch
    - self.get_output_size() without arguments
    - Auto-derive feature_keys, label_key, mode from dataset schemas
  - Rewrite GRASP.forward to use EmbeddingModel:
    - embedded, masks = self.embedding_model(kwargs, output_mask=True)
    - Labels from kwargs[self.label_key].to(self.device)
    - Eliminates ~60 lines of manual tokenization/padding/embedding
  - Remove eliminated parameters: feature_keys, label_key, mode, use_embedding
  - Update imports: SampleEHRDataset → SampleDataset, add EmbeddingModel
  - Update docstring examples to 2.0 API
  - Update __main__ block to use create_sample_dataset
  - Add tests/core/test_grasp.py with 8 test cases covering: initialization, forward/backward, embed extraction, GRU/LSTM backbones

  GRASPLayer (the algorithm core) is unchanged.

* making the PyHealth Research Initiative page way less confusing and dense (sunlabuiuc#907)

  just doc things

* add new reference to the top of the pyhealth page for our new project page so users who join can hopefully find a more easy to navigate page that isn't so documentation heavy to find what they're looking for (sunlabuiuc#910)

* [Conformal EEG] Conformal Testing Fixes (sunlabuiuc#909)
  * Fixed repo to be able to run TUEV/TUAB + updated example scripts
  * Args need to be passed correctly
  * Minor fixes and precomputed STFT logic
  * Fix the test files to reflect codebase changes
  * Args update
  * test script fixes
  * dataset path update
  * fix contrawr - small change
  * divide by 0 error
  * Incorporate tfm logic
  * Fix label stuff
  * tuab fixes
  * fix metrics
  * aggregate alphas
  * Fix splitting and add tfm weights
  * fix tfm+tuab
  * updates scripts and haoyu splitter
  * fix conflict
  * Remove weightfiles from tracking and add to .gitignore

    Weight files are large binaries distributed separately; untrack all existing .pth files under weightfiles/ and add weightfiles/ to .gitignore so they are excluded from future commits and the PR.

    Made-with: Cursor

* feat: add optional dependency groups for graph and NLP extras (sunlabuiuc#904)
  * feat: add optional dependency groups for graph and NLP extras (sunlabuiuc#890)

    Add [project.optional-dependencies] to pyproject.toml so users can install domain-specific dependencies via pip extras:

        pip install pyhealth[graph]  # torch-geometric for GraphCare, KG
        pip install pyhealth[nlp]    # editdistance, rouge_score, nltk

    The codebase already uses try/except ImportError with HAS_PYG flags for torch-geometric, and the NLP metrics define their required versions in each scorer class. This change exposes those dependencies through standard Python packaging so pip can resolve them.

    Version pins match the requirements declared in the code:
    - editdistance~=0.8.1 (pyhealth/nlp/metrics.py:356)
    - rouge_score~=0.1.2 (pyhealth/nlp/metrics.py:415)
    - nltk~=3.9.1 (pyhealth/nlp/metrics.py:397)
    - torch-geometric>=2.6.0 (compatible with PyTorch 2.7)

    Closes sunlabuiuc#890

  * fix: move optional-dependencies after scalar fields to fix TOML structure

    Move [project.optional-dependencies] from between dependencies and license (line 49) to after keywords (line 62), before [project.urls]. In TOML, a sub-table header like [project.optional-dependencies] closes the parent [project] table, so placing it before license and keywords caused those fields to be excluded from [project]. This broke CI validation.

    Verified with tomllib that all project fields (name, license, keywords, optional-dependencies, urls) parse correctly under [project].

* Add/mm retain adacare (sunlabuiuc#885)
  * init commit
  * RNN memory fix
  * add example scripts here
  * more bug fixes?
  * commit to see new changes
  * add test cases
  * fix basemodel leakage of args
  * fixes to tests and examples
  * more examples
  * reduce unnecessary checks, enable crashing on when a cache is invalid
  * fix nested sequence rnn problems
  * fixes for the concare and transformer model exploding in memory
  * fix concare merge conflict again
  * fix for 3D channel for CNN
  * update and delete defunct docs
  * better loc comparisons and also a bunch of model fixes hopefully
  * test case updates to match our bug fixes
  * fix instability in calibration tests for CP

  tldr; Fixes a variety of dataset loading, run bugs, splits for TUEV/TUAB, adds a good number of performance fixes for Transformer and Concare. We can always iterate on our fixes later.

* concare fix (sunlabuiuc#920)

  Bypassing a PR review, because of speed/reviewer bottleneck reasons.

* fix pixi warning and version format for backend (sunlabuiuc#917)

* fix: remove deprecated code_mapping, dev, refresh_cache from examples (sunlabuiuc#935)

  The v2.0 MIMIC3Dataset/MIMIC4Dataset (based on BaseDataset) no longer accepts code_mapping, dev, or refresh_cache parameters. These were part of the legacy BaseEHRDataset API. Update README.rst, example scripts, and leaderboard utilities to use the current v2.0 API.

  Note: task file docstrings and pyhealth/datasets/mimicextract.py still reference code_mapping but are left for separate PRs since mimicextract.py has not yet been migrated to v2.0.

  Fixes sunlabuiuc#535

---------

Co-authored-by: John Wu <54558896+jhnwu3@users.noreply.github.com>
Co-authored-by: Arjun Chatterjee <arj0jeechat@gmail.com>
Co-authored-by: haoyu-haoyu <85037553+haoyu-haoyu@users.noreply.github.com>
Co-authored-by: Paul Landes <landes@mailc.net>

jhnwu3 approved these changes Apr 8, 2026

jhnwu3 merged commit d7641e0 into sunlabuiuc:master Apr 8, 2026
1 check passed

jhnwu3 merged 1 commit into sunlabuiuc:master from haoyu-haoyu:fix/deprecated-code-mapping
